{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Ensure that we have Tensorflow 1.13 installed.\n",
    "!pip3 freeze | grep tensorflow==1.13.1 || pip3 install tensorflow==1.13.1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "\n",
    "tf.enable_eager_execution()\n",
    "tf.logging.set_verbosity(tf.logging.ERROR)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Intro\n",
    "\n",
    "The `tf.feature_column` package provides several options for encoding categorical data. This mini-lab gives you an oppurtunity to explore and understand these options."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Toy Features Dictionary\n",
    "\n",
    "features = {\"sq_footage\": [ 1000, 2000, 3000, 4000, 5000],\n",
    "            \"house_type\":       [\"house\", \"house\", \"apt\", \"apt\", \"townhouse\"]}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Feature Column Definition\n",
    "\n",
    "We have one continuous feature and one categorical feature.\n",
    "\n",
    "Note that the category 'townhouse' is outside of our vocabulary list (OOV for short)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "feat_cols = [\n",
    "    tf.feature_column.numeric_column('sq_footage'),\n",
    "    tf.feature_column.indicator_column(\n",
    "        tf.feature_column.categorical_column_with_vocabulary_list(\n",
    "            'house_type',['house','apt']\n",
    "        ))\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Inspect Transformed Data\n",
    "\n",
    "This is what would be input to your model would be after the features are transformed by the feature column specification."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tf.feature_column.input_layer(features,feat_cols)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Excercise 1\n",
    "\n",
    "What is the current encoding behavior for the OOV value?\n",
    "\n",
    "Modify the feature column to have OOV values default to the 'house' category."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "feat_cols = [\n",
    "    tf.feature_column.numeric_column('sq_footage'),\n",
    "    tf.feature_column.indicator_column(\n",
    "        tf.feature_column.categorical_column_with_vocabulary_list(\n",
    "            #TODO\n",
    "        ))\n",
    "]\n",
    "\n",
    "tf.feature_column.input_layer(features,feat_cols)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Excercise 2\n",
    "\n",
    "Now modify the feature column to have OOV values be assigned to a separate 'catch-all' category.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "feat_cols = [\n",
    "    tf.feature_column.numeric_column('sq_footage'),\n",
    "    tf.feature_column.indicator_column(\n",
    "        tf.feature_column.categorical_column_with_vocabulary_list(\n",
    "            #TODO\n",
    "        ))\n",
    "]\n",
    "\n",
    "tf.feature_column.input_layer(features,feat_cols)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Excercise 3\n",
    "\n",
    "Assume we didn't have a vocabulary list available. Modify the feature column to one-hot encode house type based on a hash function.\n",
    "\n",
    "What is the minimum hash size to ensure no collisions?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "feat_cols = [\n",
    "    tf.feature_column.numeric_column('sq_footage'),\n",
    "    tf.feature_column.indicator_column(\n",
    "        tf.feature_column.categorical_column_with_hash_bucket(\n",
    "            #TODO\n",
    "        ))\n",
    "]\n",
    "\n",
    "tf.feature_column.input_layer(features,feat_cols)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
